ABSTRACT


In this paper, task difficulty evaluation and testee level ranking are combined to appraise employee performance. When employees of different designations are tested on tasks, it is difficult to rank their outcomes. In performance appraisal, higher officials evaluate employee job performance and provide feedback, including steps to improve or redirect activities as needed. Documenting performance provides a basis for pay increases and promotions. In this project, a higher official evaluates lower officials by assigning tasks to them as testees, and can thereby solve the testing and ranking problem for appraisal. These findings are also useful for intelligence tests. Here, the evaluation of each employee is calculated and displayed using a model based on the Bernoulli distribution.
Keywords: task evaluation, employee appraisal, ranking of employees


I. Introduction 


Testing the intelligence level of a human or an artificial intelligent system is an important yet hard problem. Since it is really hard to define intelligence directly, we usually characterize the intelligence level of humans or artificial intelligent systems indirectly, according to their behaviors in some dedicated tests.


For example, we often test the intelligence levels of children based on their performance in some standard tasks designed by human experts. However, such methods may not work for some artificial intelligent systems, since we do not have a standard level of comparison for their tasks.

In this paper, we study the testing tasks evaluation and testees ranking problem. Tasks may have different difficulty levels, and testees may have different capabilities; however, neither is known in advance. For example, when testing intelligent vehicles, we usually set up a series of tasks (e.g., traffic sign detection, vehicle detection, and pedestrian detection) for each vehicle and check whether it can successfully finish these tasks in time. However, we cannot accurately determine the difficulty of each task beforehand. The vehicles that take part in the tests may also have noticeably different capabilities, so we cannot judge the difficulty of each task simply by their performance: a failure may reflect either a hard task or a weak vehicle.


Our objective is to simultaneously determine the relative difficulty level of each testing task and the relative capability of every testee, purely based on the outcome of every testee on each task. Moreover, to allow for uncertainty, we assume that each testee passes a given task only with a certain probability.


To solve this problem, we design two models. One assumes that each test outcome follows a Bernoulli distribution, while the other additionally imposes beta-distribution-type a priori knowledge on the model parameters. We formulate both as likelihood estimation problems and propose a coordinate descent algorithm to estimate the parameters of the studied distributions. We show that the beta-distribution-type a priori knowledge is useful when only a limited number of tests can be carried out due to time and financial budgets.
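The section does not spell out how $p_i$ and $q_j$ combine into a probability of passing. Purely as an illustrative assumption (not necessarily the authors' parameterization), suppose testee $i$ passes task $j$ with probability $\theta_{ij} = p_i(1-q_j)$, where $p_i \in (0,1)$ is the capability of testee $i$, $q_j \in (0,1)$ is the difficulty of task $j$, and $x_{ij} \in \{0,1\}$ is the observed outcome. The two models then lead to the following log-likelihood and log-posterior, where $\alpha$ and $\beta$ are hypothetical hyperparameters of independent beta priors on every $p_i$ and $q_j$:

```latex
\begin{aligned}
\log L(p, q) &= \sum_{i}\sum_{j}\Big[x_{ij}\log\theta_{ij} + (1 - x_{ij})\log(1 - \theta_{ij})\Big],
  \qquad \theta_{ij} = p_i\,(1 - q_j),\\
\log P(p, q \mid x) &= \log L(p, q)
  + \sum_{i}\Big[(\alpha - 1)\log p_i + (\beta - 1)\log(1 - p_i)\Big]\\
&\quad + \sum_{j}\Big[(\alpha - 1)\log q_j + (\beta - 1)\log(1 - q_j)\Big] + \text{const}.
\end{aligned}
```

With $\alpha = \beta = 1$ the beta prior is uniform and the second objective reduces to the first; with $\alpha, \beta > 1$ the prior pulls estimates away from 0 and 1, which matters most when each task has been tried only a few times. Under this assumed form only relative values are identifiable: scaling every $p_i$ by $c$ and every $1-q_j$ by $1/c$ leaves each $\theta_{ij}$ unchanged whenever the scaled values stay in $(0,1)$.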


To better present our findings, the rest of this paper is arranged as follows. Sections II and III present the problems and give two statistical learning models as well as the solving algorithm. Section IV provides an empirical numerical study. Section V discusses how to extend this model to more general cases. Finally, Section VI provides the conclusion.


This model is somewhat like the logistic principal component analysis model proposed in prior work. However, our model makes quite different assumptions about the latent structure of the statistical model.


Clearly, deriving estimates of $p_i$ and $q_j$ is equivalent to finding the parameter set $\{p_i, q_j\}$ that maximizes the likelihood $L$. However, this maximum likelihood estimation (MLE) problem is not a convex optimization problem.
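The source does not give the link between $p_i$, $q_j$ and the probability of passing. The sketch below assumes, for illustration only, that testee $i$ passes task $j$ with probability $p_i(1-q_j)$, and evaluates the resulting Bernoulli log-likelihood with NumPy. The function name `log_likelihood` and the array layout (rows = testees, columns = tasks) are our own choices. The last lines make the non-convexity concrete under this assumption: two parameter sets with the same likelihood have a midpoint with a strictly lower likelihood, which a concave log-likelihood could not have.

```python
import numpy as np


def log_likelihood(X, p, q):
    """Bernoulli log-likelihood under the ASSUMED link P(x_ij = 1) = p_i * (1 - q_j).

    X : (n_testees, n_tasks) array of 0/1 outcomes, X[i, j] for testee i on task j
    p : (n_testees,) capabilities, each strictly inside (0, 1)
    q : (n_tasks,) difficulties, each strictly inside (0, 1)
    """
    theta = np.outer(p, 1.0 - q)  # theta[i, j] = probability that testee i passes task j
    return float(np.sum(X * np.log(theta) + (1 - X) * np.log1p(-theta)))


# Non-concavity check on a single failed test (x_11 = 0):
X = np.array([[0.0]])
end_a = log_likelihood(X, np.array([0.4]), np.array([0.50]))  # theta = 0.4 * 0.50 = 0.2
end_b = log_likelihood(X, np.array([0.8]), np.array([0.75]))  # theta = 0.8 * 0.25 = 0.2
mid = log_likelihood(X, np.array([0.6]), np.array([0.625]))   # midpoint: theta = 0.225
print(mid < min(end_a, end_b))  # True, so log L is not concave in (p, q) jointly
```

The two endpoints also show why only relative values can be recovered: doubling $p_1$ while halving $1-q_1$ leaves the likelihood unchanged.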


Summarizing the above analysis, we propose the following coordinate descent algorithm. Numerical tests show that it converges quickly to a locally optimal solution. To increase the probability of reaching the globally optimal solution, we can run the algorithm several times with different initial values and keep the best result. Each iteration either increases or maintains the lower-bound estimate of the log-likelihood; consequently, this lower bound finally converges to a local maximum.
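The algorithm listing itself is not reproduced in this section. The sketch below is one possible coordinate scheme, written as ascent on $\log L$ (equivalently, descent on $-\log L$) under the assumed link $P(x_{ij}=1) = p_i(1-q_j)$. Instead of the lower-bound updates mentioned above, it maximizes $\log L$ exactly in one parameter at a time: with all other parameters fixed, each one-dimensional subproblem is concave, so bisection on its derivative finds the maximizer and the log-likelihood never decreases. The names `coordinate_ascent` and `fit_with_restarts`, the clamp `EPS`, and the uniform initialization are hypothetical choices.

```python
import numpy as np

EPS = 1e-6  # keep every parameter strictly inside (0, 1)


def log_likelihood(X, p, q):
    theta = np.outer(p, 1.0 - q)  # ASSUMED link: P(x_ij = 1) = p_i * (1 - q_j)
    return float(np.sum(X * np.log(theta) + (1 - X) * np.log1p(-theta)))


def argmax_concave_1d(grad, lo=EPS, hi=1.0 - EPS, iters=60):
    """Maximise a concave 1-D function on [lo, hi] by bisection on its decreasing derivative."""
    if grad(lo) <= 0.0:
        return lo
    if grad(hi) >= 0.0:
        return hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if grad(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def coordinate_ascent(X, n_sweeps=50, seed=0):
    """X[i, j] = 0/1 outcome of testee i on task j. Returns p, q and the log-likelihood trace."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    p, q = rng.uniform(0.2, 0.8, n), rng.uniform(0.2, 0.8, m)
    history = [log_likelihood(X, p, q)]
    for _ in range(n_sweeps):
        for i in range(n):  # capability p_i, all q fixed
            a, x = 1.0 - q, X[i]
            p[i] = argmax_concave_1d(lambda t: np.sum(x / t - (1 - x) * a / (1 - t * a)))
        for j in range(m):  # difficulty q_j, all p fixed
            x = X[:, j]
            q[j] = argmax_concave_1d(
                lambda t: np.sum(-x / (1 - t) + (1 - x) * p / (1 - p + p * t)))
        history.append(log_likelihood(X, p, q))
    return p, q, history


def fit_with_restarts(X, n_restarts=5, n_sweeps=50):
    """Run from several random starts and keep the run with the highest final log-likelihood."""
    runs = [coordinate_ascent(X, n_sweeps, seed) for seed in range(n_restarts)]
    return max(runs, key=lambda run: run[2][-1])
```

Usage: `p, q, history = fit_with_restarts(X)`; sorting `p` and `q` then ranks testees by capability and tasks by difficulty.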


II. TESTING RESULTS


Here, we give a numerical example of intelligent vehicle testing. In November 2017, 11 teams took part in the vehicle detection competition supported by the National Natural Science Foundation of China. The competition consists of 48 vehicles to detect (i.e., 48 tasks), and the testing results are shown in Fig. 1. Each column represents a testee, while each row represents a task vehicle; the element in the jth row and ith column is the binary detection result, where 1 means passed and 0 means failed. Although the team that worked out 46 tasks is very close to team 7 (which worked out 47 tasks), their capability difference should not be ignored.
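For comparison with estimated capabilities and difficulties, simple ranks of testees and tasks can be read directly off a binary result matrix laid out as in Fig. 1 (rows = tasks, columns = testees). The sketch below assumes that a testee's simple rank is given by the number of tasks it worked out, and a task's by the number of testees that failed it. The function name `simple_ranks` and the toy matrix are hypothetical and are not the competition data.

```python
import numpy as np


def simple_ranks(R):
    """R: (n_tasks, n_testees) 0/1 matrix, rows = tasks, columns = testees (1 = passed)."""
    passed_per_testee = R.sum(axis=0)     # tasks each testee worked out
    failed_per_task = (1 - R).sum(axis=1)  # testees that failed each task (higher = harder)
    # Most capable testee / most difficult task first; ties keep their original index order.
    testee_order = np.argsort(-passed_per_testee, kind="stable")
    task_order = np.argsort(-failed_per_task, kind="stable")
    return passed_per_testee, failed_per_task, testee_order, task_order


# Toy example: 4 tasks x 3 testees (hypothetical data).
R = np.array([[1, 1, 1],
              [1, 0, 1],
              [0, 0, 1],
              [1, 0, 0]])
passed, failed, testee_order, task_order = simple_ranks(R)
print(passed, failed, testee_order, task_order)
```

Such counts treat every task as equally hard, which is why two teams that worked out 46 and 47 tasks look almost identical under them.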
Intuitively, the most difficult task, i.e., task 7, and the most capable testee, i.e., team 7, located in the top right of each figure, should be distinguishable from other tasks and testees, especially nearby ones. Such distinguishability implies that the curves of the estimated and sorted $p_i$ and $q_j$ should not be flat. In contrast, the Bernoulli distribution model with beta-distribution-type a priori knowledge appropriately emphasizes this point and, meanwhile, gives prominence to the difficulty level of task 7, as shown in Fig. 3. Therefore, we can conclude that when only a limited number of tests is available for each task, the beta-distribution-type a priori knowledge is needed.
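To see how beta-distribution-type a priori knowledge changes the estimates, the sketch below adds independent Beta($\alpha$, $\beta$) log-priors on every $p_i$ and $q_j$ to the Bernoulli log-likelihood and maximizes the resulting log-posterior one coordinate at a time. The link $P(x_{ij}=1) = p_i(1-q_j)$, the default $\alpha=\beta=2$, and all function names are assumptions for illustration. Keeping $\alpha, \beta \ge 1$ preserves concavity of every one-dimensional subproblem, so bisection on the derivative still applies.

```python
import numpy as np

EPS = 1e-6  # keep every parameter strictly inside (0, 1)


def argmax_concave_1d(grad, lo=EPS, hi=1.0 - EPS, iters=60):
    """Maximise a concave 1-D function on [lo, hi] by bisection on its decreasing derivative."""
    if grad(lo) <= 0.0:
        return lo
    if grad(hi) >= 0.0:
        return hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if grad(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def log_posterior(X, p, q, alpha, beta):
    """Bernoulli log-likelihood (ASSUMED link p_i * (1 - q_j)) plus Beta(alpha, beta) log-priors."""
    theta = np.outer(p, 1.0 - q)
    ll = np.sum(X * np.log(theta) + (1 - X) * np.log1p(-theta))
    params = np.concatenate([p, q])
    return float(ll + np.sum((alpha - 1) * np.log(params) + (beta - 1) * np.log1p(-params)))


def map_coordinate_ascent(X, alpha=2.0, beta=2.0, n_sweeps=50, seed=0):
    """MAP estimate of p (testees) and q (tasks); X[i, j] = 0/1 outcome of testee i on task j."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    p, q = rng.uniform(0.2, 0.8, n), rng.uniform(0.2, 0.8, m)

    def prior_grad(t):  # derivative of the Beta(alpha, beta) log-density at t
        return (alpha - 1) / t - (beta - 1) / (1 - t)

    history = [log_posterior(X, p, q, alpha, beta)]
    for _ in range(n_sweeps):
        for i in range(n):
            a, x = 1.0 - q, X[i]
            p[i] = argmax_concave_1d(
                lambda t: np.sum(x / t - (1 - x) * a / (1 - t * a)) + prior_grad(t))
        for j in range(m):
            x = X[:, j]
            q[j] = argmax_concave_1d(
                lambda t: np.sum(-x / (1 - t) + (1 - x) * p / (1 - p + p * t)) + prior_grad(t))
        history.append(log_posterior(X, p, q, alpha, beta))
    return p, q, history
```

With $\alpha = \beta = 1$ this reduces to plain maximum likelihood, where a testee that passes every task is pushed to the boundary $p_i \to 1$. With $\alpha = \beta = 2$ the same testee gets $p_i = (m+1)/(m+2)$ for $m$ tasks, so testees and tasks with few observations are kept away from the extremes.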


Figure: (a) Estimated, sorted and normalized $p_i$, and the simple ranks of the testees; (b) estimated and sorted $q_j$, and the simple ranks of the tasks (value plotted against index).


III. FURTHER DISCUSSION
Model Estimation Considering Missing Data


The aforementioned statistical learning model (1) provides a basis for simultaneous tasks evaluation and testees ranking. However, in many situations, we do not have enough time and money to carry out tests on each task for every testee. To handle parameter estimation with missing observations, we resort to the classic expectation maximization (EM) algorithm. Suppose that the available part of $x_{ij}$ can be written as $x^{\mathrm{OBS}}$ and the missing part as $x^{\mathrm{MISS}}$. The detailed algorithm is given as follows. It is easy to prove that this algorithm is a standard EM algorithm and converges quickly to a locally optimal solution. We need to run it several times with different initial values to increase the probability of reaching the globally optimal solution; the larger the missing ratio, the more runs are needed. Similarly, we can handle the case in which the beta-distribution-type a priori knowledge is considered.
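The detailed EM listing is not reproduced in this section. The sketch below is one standard way to apply EM here, under the assumed link $P(x_{ij}=1) = p_i(1-q_j)$: the E-step replaces every missing $x_{ij}$ by its expected value $\theta_{ij}$ under the current parameters, and a partial M-step runs one coordinate-ascent sweep on the completed, now fractional, outcomes. Because that sweep never decreases the expected complete-data log-likelihood, this is a generalized EM, and the observed-data log-likelihood never decreases. The boolean `mask` (True where a test was actually run) and all function names are hypothetical.

```python
import numpy as np

EPS = 1e-6  # keep every parameter strictly inside (0, 1)


def argmax_concave_1d(grad, lo=EPS, hi=1.0 - EPS, iters=60):
    """Maximise a concave 1-D function on [lo, hi] by bisection on its decreasing derivative."""
    if grad(lo) <= 0.0:
        return lo
    if grad(hi) >= 0.0:
        return hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if grad(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def observed_log_likelihood(X, mask, p, q):
    """Log-likelihood of the observed entries only (ASSUMED link p_i * (1 - q_j))."""
    theta = np.outer(p, 1.0 - q)
    ll = X * np.log(theta) + (1 - X) * np.log1p(-theta)
    return float(np.sum(ll[mask]))


def em_missing(X, mask, n_iters=30, seed=0):
    """X[i, j]: outcome of testee i on task j; mask[i, j] is True where that test was run."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    X = np.where(mask, X, 0.0)  # values at missing entries are placeholders, never used as data
    p, q = rng.uniform(0.2, 0.8, n), rng.uniform(0.2, 0.8, m)
    history = [observed_log_likelihood(X, mask, p, q)]
    for _ in range(n_iters):
        # E-step: each missing x_ij is replaced by its expectation under the current p, q.
        Z = np.where(mask, X, np.outer(p, 1.0 - q))
        # Partial M-step: one coordinate-ascent sweep on the completed (soft) outcomes Z.
        for i in range(n):
            a, z = 1.0 - q, Z[i]
            p[i] = argmax_concave_1d(lambda t: np.sum(z / t - (1 - z) * a / (1 - t * a)))
        for j in range(m):
            z = Z[:, j]
            q[j] = argmax_concave_1d(
                lambda t: np.sum(-z / (1 - t) + (1 - z) * p / (1 - p + p * t)))
        history.append(observed_log_likelihood(X, mask, p, q))
    return p, q, history
```

As the text suggests, `em_missing` can be called with several `seed` values, keeping the run whose final `history` entry is largest; more restarts are advisable as the missing ratio grows.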


Model Estimation Considering Multiple Experiments 


The above models assume that every testee runs each test once. However, we can further relax this assumption and allow the ith testee to run the jth testing task $y_{ij}$ times and record the number of successful completions. We can still derive estimates of $p_i$ and $q_j$ by finding the parameter set of $p_i$ and $q_j$ that maximizes the likelihood function.
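Under an assumed link $\theta_{ij} = p_i(1-q_j)$ (an illustrative choice, not stated in the source), the multiple-experiment likelihood replaces each Bernoulli term by a binomial one. Writing $s_{ij}$ (our notation) for the number of successful runs out of $y_{ij}$, each $x_{ij}$ becomes $s_{ij}$ and each $1 - x_{ij}$ becomes $y_{ij} - s_{ij}$, so coordinate-wise updates designed for the single-run model carry over after this substitution. A minimal NumPy sketch with hypothetical names:

```python
import numpy as np


def binomial_log_likelihood(S, Y, p, q):
    """Log-likelihood when testee i runs task j Y[i, j] times and succeeds S[i, j] times.

    Uses the ASSUMED link theta_ij = p_i * (1 - q_j). The binomial coefficient
    log C(Y, S) is dropped because it does not depend on p or q.
    """
    theta = np.outer(p, 1.0 - q)
    return float(np.sum(S * np.log(theta) + (Y - S) * np.log1p(-theta)))


def grad_p(S, Y, p, q, i):
    """Partial derivative of the log-likelihood with respect to p_i (decreasing in p_i)."""
    a = 1.0 - q
    return float(np.sum(S[i] / p[i] - (Y[i] - S[i]) * a / (1.0 - p[i] * a)))


# One testee, one task, run 3 times with 2 successes (hypothetical data).
S, Y = np.array([[2.0]]), np.array([[3.0]])
print(binomial_log_likelihood(S, Y, np.array([0.5]), np.array([0.5])))
```

Setting every $y_{ij} = 1$ recovers the single-run Bernoulli log-likelihood, and a pair with $y_{ij} = 0$ (never tested) contributes nothing to the sum.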


IV. CONCLUSION 


In this paper, we discuss how to simultaneously determine the difficulty levels of testing tasks and the capabilities of testees, especially when heterogeneous tasks and testees are considered. We propose several statistical learning models to handle such problems, which allow for uncertainty in whether testees finish tasks. The numerical test results verify the effectiveness of the proposed learning models and of the solving algorithms. All these findings are useful for designing tests for artificial intelligent systems.


It should be pointed out that the aforementioned algorithm only provides an up-to-date estimate of the tasks and testees, because the capability of a testee may grow with time and the relative difficulty of a task may vary accordingly. Similar to the Elo rating system for chess and Go games [6], we need to continuously design new and tougher tasks to keep a proper understanding of the testees.